{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "016b5598",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/RedisIndexDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0b692c73",
   "metadata": {},
   "source": [
    "# Redis Vector Store"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1e7787c2",
   "metadata": {},
   "source": [
    "In this notebook we are going to show a quick demo of using the RedisVectorStore."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c479ce87",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1730d643",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -U llama-index llama-index-vector-stores-redis llama-index-embeddings-cohere llama-index-embeddings-openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47264e32",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import getpass\n",
    "import sys\n",
    "import logging\n",
    "import textwrap\n",
    "import warnings\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "\n",
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
    "from llama_index.vector_stores.redis import RedisVectorStore"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3c692310",
   "metadata": {},
   "source": [
    "### Start Redis\n",
    "\n",
    "The easiest way to start Redis is using the [Redis Stack](https://hub.docker.com/r/redis/redis-stack) docker image or\n",
    "quickly signing up for a [FREE Redis Cloud](https://redis.com/try-free) instance.\n",
    "\n",
    "To follow every step of this tutorial, launch the image as follows:\n",
    "\n",
    "```bash\n",
    "docker run --name redis-vecdb -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest\n",
    "```\n",
    "\n",
    "This will also launch the RedisInsight UI on port 8001 which you can view at http://localhost:8001.\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f9b97a89",
   "metadata": {},
   "source": [
    "### Setup OpenAI\n",
    "Lets first begin by adding the openai api key. This will allow us to access openai for embeddings and to use chatgpt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c9f4d21-145a-401e-95ff-ccb259e8ef84",
   "metadata": {},
   "outputs": [],
   "source": [
    "oai_api_key = getpass.getpass(\"OpenAI API Key:\")\n",
    "os.environ[\"OPENAI_API_KEY\"] = oai_api_key"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "103ff054",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "304ad9d8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-04-10 19:35:33--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8003::154, 2606:50c0:8000::154, 2606:50c0:8002::154, ...\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8003::154|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 75042 (73K) [text/plain]\n",
      "Saving to: ‘data/paul_graham/paul_graham_essay.txt’\n",
      "\n",
      "data/paul_graham/pa 100%[===================>]  73.28K  --.-KB/s    in 0.03s   \n",
      "\n",
      "2024-04-10 19:35:33 (2.15 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "59ff935d",
   "metadata": {},
   "source": [
    "### Read in a dataset\n",
    "Here we will use a set of Paul Graham essays to provide the text to turn into embeddings, store in a ``RedisVectorStore`` and query to find context for our LLM QnA loop."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: 7056f7ba-3513-4ef4-9792-2bd28040aaed Document Filename: paul_graham_essay.txt\n"
     ]
    }
   ],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\n",
    "print(\n",
    "    \"Document ID:\",\n",
    "    documents[0].id_,\n",
    "    \"Document Filename:\",\n",
    "    documents[0].metadata[\"file_name\"],\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "dd270925",
   "metadata": {},
   "source": [
    "### Initialize the default Redis Vector Store\n",
    "\n",
    "Now we have our documents prepared, we can initialize the Redis Vector Store with **default** settings. This will allow us to store our vectors in Redis and create an index for real-time search."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba1558b3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:39:17 llama_index.vector_stores.redis.base INFO   Using default RedisVectorStore schema.\n",
      "19:39:19 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "19:39:19 llama_index.vector_stores.redis.base INFO   Added 22 documents to index llama_index\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core import StorageContext\n",
    "from redis import Redis\n",
    "\n",
    "# create a Redis client connection\n",
    "redis_client = Redis.from_url(\"redis://localhost:6379\")\n",
    "\n",
    "# create the vector store wrapper\n",
    "vector_store = RedisVectorStore(redis_client=redis_client, overwrite=True)\n",
    "\n",
    "# load storage context\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "\n",
    "# build and load index from documents and storage context\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")\n",
    "# index = VectorStoreIndex.from_vector_store(vector_store=vector_store)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc00b3fb",
   "metadata": {},
   "source": [
    "### Query the default vector store\n",
    "\n",
    "Now that we have our data stored in the index, we can ask questions against the index.\n",
    "\n",
    "The index will use the data as the knowledge base for an LLM. The default setting for as_query_engine() utilizes OpenAI embeddings and GPT as the language model. Therefore, an OpenAI key is required unless you opt for a customized or local language model.\n",
    "\n",
    "Below we will test searches against out index and then full RAG with an LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c50a593f",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "retriever = index.as_retriever()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3f0daf7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:39:22 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "19:39:22 llama_index.vector_stores.redis.base INFO   Querying index llama_index with filters *\n",
      "19:39:22 llama_index.vector_stores.redis.base INFO   Found 2 results for query with id ['llama_index/vector_adb6b7ce-49bb-4961-8506-37082c02a389', 'llama_index/vector_e39be1fe-32d0-456e-b211-4efabd191108']\n",
      "Node ID: adb6b7ce-49bb-4961-8506-37082c02a389\n",
      "Text: What I Worked On  February 2021  Before college the two main\n",
      "things I worked on, outside of school, were writing and programming. I\n",
      "didn't write essays. I wrote what beginning writers were supposed to\n",
      "write then, and probably still are: short stories. My stories were\n",
      "awful. They had hardly any plot, just characters with strong feelings,\n",
      "which I ...\n",
      "Score:  0.820\n",
      "\n",
      "Node ID: e39be1fe-32d0-456e-b211-4efabd191108\n",
      "Text: Except for a few officially anointed thinkers who went to the\n",
      "right parties in New York, the only people allowed to publish essays\n",
      "were specialists writing about their specialties. There were so many\n",
      "essays that had never been written, because there had been no way to\n",
      "publish them. Now they could be, and I was going to write them. [12]\n",
      "I've wor...\n",
      "Score:  0.819\n",
      "\n"
     ]
    }
   ],
   "source": [
    "result_nodes = retriever.retrieve(\"What did the author learn?\")\n",
    "for node in result_nodes:\n",
    "    print(node)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e13d7726",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:39:25 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "19:39:25 llama_index.vector_stores.redis.base INFO   Querying index llama_index with filters *\n",
      "19:39:25 llama_index.vector_stores.redis.base INFO   Found 2 results for query with id ['llama_index/vector_adb6b7ce-49bb-4961-8506-37082c02a389', 'llama_index/vector_e39be1fe-32d0-456e-b211-4efabd191108']\n",
      "19:39:27 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "The author learned that working on things that weren't prestigious often led to valuable discoveries\n",
      "and indicated the right kind of motives. Despite the lack of initial prestige, pursuing such work\n",
      "could be a sign of genuine potential and appropriate motivations, steering clear of the common\n",
      "pitfall of being driven solely by the desire to impress others.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What did the author learn?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b99b79b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:39:27 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "19:39:27 llama_index.vector_stores.redis.base INFO   Querying index llama_index with filters *\n",
      "19:39:27 llama_index.vector_stores.redis.base INFO   Found 2 results for query with id ['llama_index/vector_adb6b7ce-49bb-4961-8506-37082c02a389', 'llama_index/vector_e39be1fe-32d0-456e-b211-4efabd191108']\n",
      "Node ID: adb6b7ce-49bb-4961-8506-37082c02a389\n",
      "Text: What I Worked On  February 2021  Before college the two main\n",
      "things I worked on, outside of school, were writing and programming. I\n",
      "didn't write essays. I wrote what beginning writers were supposed to\n",
      "write then, and probably still are: short stories. My stories were\n",
      "awful. They had hardly any plot, just characters with strong feelings,\n",
      "which I ...\n",
      "Score:  0.802\n",
      "\n",
      "Node ID: e39be1fe-32d0-456e-b211-4efabd191108\n",
      "Text: Except for a few officially anointed thinkers who went to the\n",
      "right parties in New York, the only people allowed to publish essays\n",
      "were specialists writing about their specialties. There were so many\n",
      "essays that had never been written, because there had been no way to\n",
      "publish them. Now they could be, and I was going to write them. [12]\n",
      "I've wor...\n",
      "Score:  0.799\n",
      "\n"
     ]
    }
   ],
   "source": [
    "result_nodes = retriever.retrieve(\"What was a hard moment for the author?\")\n",
    "for node in result_nodes:\n",
    "    print(node)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0838ee1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:39:29 httpx INFO   HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "19:39:29 llama_index.vector_stores.redis.base INFO   Querying index llama_index with filters *\n",
      "19:39:29 llama_index.vector_stores.redis.base INFO   Found 2 results for query with id ['llama_index/vector_adb6b7ce-49bb-4961-8506-37082c02a389', 'llama_index/vector_e39be1fe-32d0-456e-b211-4efabd191108']\n",
      "19:39:31 httpx INFO   HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
      "A hard moment for the author was when one of his programs on the IBM 1401 mainframe didn't\n",
      "terminate, leading to a technical error and an uncomfortable situation with the data center manager.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\"What was a hard moment for the author?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba33eb01",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:39:34 llama_index.vector_stores.redis.base INFO   Deleting index llama_index\n"
     ]
    }
   ],
   "source": [
    "index.vector_store.delete_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "831452c8",
   "metadata": {},
   "source": [
    "### Use a custom index schema\n",
    "\n",
    "In most use cases, you need the ability to customize the underling index configuration\n",
    "and specification. For example, this is handy in order to define specific metadata filters you wish to enable.\n",
    "\n",
    "With Redis, this is as simple as defining an index schema object\n",
    "(from file or dict) and passing it through to the vector store client wrapper.\n",
    "\n",
    "For this example, we will:\n",
    "1. switch the embedding model to [Cohere](cohereai.com)\n",
    "2. add an additional metadata field for the document `updated_at` timestamp\n",
    "3. index the existing `file_name` metadata field"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2022e92a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.settings import Settings\n",
    "from llama_index.embeddings.cohere import CohereEmbedding\n",
    "\n",
    "# set up Cohere Key\n",
    "co_api_key = getpass.getpass(\"Cohere API Key:\")\n",
    "os.environ[\"CO_API_KEY\"] = co_api_key\n",
    "\n",
    "# set llamaindex to use Cohere embeddings\n",
    "Settings.embed_model = CohereEmbedding()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c07e9747",
   "metadata": {},
   "outputs": [],
   "source": [
    "from redisvl.schema import IndexSchema\n",
    "\n",
    "\n",
    "custom_schema = IndexSchema.from_dict(\n",
    "    {\n",
    "        # customize basic index specs\n",
    "        \"index\": {\n",
    "            \"name\": \"paul_graham\",\n",
    "            \"prefix\": \"essay\",\n",
    "            \"key_separator\": \":\",\n",
    "        },\n",
    "        # customize fields that are indexed\n",
    "        \"fields\": [\n",
    "            # required fields for llamaindex\n",
    "            {\"type\": \"tag\", \"name\": \"id\"},\n",
    "            {\"type\": \"tag\", \"name\": \"doc_id\"},\n",
    "            {\"type\": \"text\", \"name\": \"text\"},\n",
    "            # custom metadata fields\n",
    "            {\"type\": \"numeric\", \"name\": \"updated_at\"},\n",
    "            {\"type\": \"tag\", \"name\": \"file_name\"},\n",
    "            # custom vector field definition for cohere embeddings\n",
    "            {\n",
    "                \"type\": \"vector\",\n",
    "                \"name\": \"vector\",\n",
    "                \"attrs\": {\n",
    "                    \"dims\": 1024,\n",
    "                    \"algorithm\": \"hnsw\",\n",
    "                    \"distance_metric\": \"cosine\",\n",
    "                },\n",
    "            },\n",
    "        ],\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22184dd0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "IndexInfo(name='paul_graham', prefix='essay', key_separator=':', storage_type=<StorageType.HASH: 'hash'>)"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "custom_schema.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2bf50ab5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id': TagField(name='id', type='tag', path=None, attrs=TagFieldAttributes(sortable=False, separator=',', case_sensitive=False, withsuffixtrie=False)),\n",
       " 'doc_id': TagField(name='doc_id', type='tag', path=None, attrs=TagFieldAttributes(sortable=False, separator=',', case_sensitive=False, withsuffixtrie=False)),\n",
       " 'text': TextField(name='text', type='text', path=None, attrs=TextFieldAttributes(sortable=False, weight=1, no_stem=False, withsuffixtrie=False, phonetic_matcher=None)),\n",
       " 'updated_at': NumericField(name='updated_at', type='numeric', path=None, attrs=NumericFieldAttributes(sortable=False)),\n",
       " 'file_name': TagField(name='file_name', type='tag', path=None, attrs=TagFieldAttributes(sortable=False, separator=',', case_sensitive=False, withsuffixtrie=False)),\n",
       " 'vector': HNSWVectorField(name='vector', type='vector', path=None, attrs=HNSWVectorFieldAttributes(dims=1024, algorithm=<VectorIndexAlgorithm.HNSW: 'HNSW'>, datatype=<VectorDataType.FLOAT32: 'FLOAT32'>, distance_metric=<VectorDistanceMetric.COSINE: 'COSINE'>, initial_cap=None, m=16, ef_construction=200, ef_runtime=10, epsilon=0.01))}"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "custom_schema.fields"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b05ebd97",
   "metadata": {},
   "source": [
    "Learn more about [schema and index design](https://redisvl.com) with redis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61b01276",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "\n",
    "def date_to_timestamp(date_string: str) -> int:\n",
    "    date_format: str = \"%Y-%m-%d\"\n",
    "    return int(datetime.strptime(date_string, date_format).timestamp())\n",
    "\n",
    "\n",
    "# iterate through documents and add new field\n",
    "for document in documents:\n",
    "    document.metadata[\"updated_at\"] = date_to_timestamp(\n",
    "        document.metadata[\"last_modified_date\"]\n",
    "    )"
   ]
  },
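  {
   "cell_type": "markdown",
   "id": "a7d3c1e2",
   "metadata": {},
   "source": [
    "The `date_to_timestamp` helper above parses a naive date, so `.timestamp()` interprets it in the machine's local time zone and the resulting value can differ between machines. As a minimal sketch, assuming you want the same value everywhere, a UTC-pinned variant could look like this (`date_to_utc_timestamp` is a hypothetical name, not part of LlamaIndex or RedisVL):\n",
    "\n",
    "```python\n",
    "from datetime import datetime, timezone\n",
    "\n",
    "\n",
    "def date_to_utc_timestamp(date_string: str) -> int:\n",
    "    # hypothetical helper: treat the parsed date as midnight UTC\n",
    "    parsed = datetime.strptime(date_string, \"%Y-%m-%d\")\n",
    "    return int(parsed.replace(tzinfo=timezone.utc).timestamp())\n",
    "\n",
    "\n",
    "print(date_to_utc_timestamp(\"2023-01-01\"))  # 1672531200\n",
    "```\n",
    "\n",
    "If you adopt a variant like this, use it both when writing `updated_at` and when building filter values so the two stay comparable."
   ]
  },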
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e871823e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:40:05 httpx INFO   HTTP Request: POST https://api.cohere.ai/v1/embed \"HTTP/1.1 200 OK\"\n",
      "19:40:06 httpx INFO   HTTP Request: POST https://api.cohere.ai/v1/embed \"HTTP/1.1 200 OK\"\n",
      "19:40:06 httpx INFO   HTTP Request: POST https://api.cohere.ai/v1/embed \"HTTP/1.1 200 OK\"\n",
      "19:40:06 llama_index.vector_stores.redis.base INFO   Added 22 documents to index paul_graham\n"
     ]
    }
   ],
   "source": [
    "vector_store = RedisVectorStore(\n",
    "    schema=custom_schema,  # provide customized schema\n",
    "    redis_client=redis_client,\n",
    "    overwrite=True,\n",
    ")\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "\n",
    "# build and load index from documents and storage context\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3791a32c",
   "metadata": {},
   "source": [
    "### Query the vector store and filter on metadata\n",
    "Now that we have additional metadata indexed in Redis, let's try some queries with filters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb2c21ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.vector_stores import (\n",
    "    MetadataFilters,\n",
    "    MetadataFilter,\n",
    "    ExactMatchFilter,\n",
    ")\n",
    "\n",
    "retriever = index.as_retriever(\n",
    "    similarity_top_k=3,\n",
    "    filters=MetadataFilters(\n",
    "        filters=[\n",
    "            ExactMatchFilter(key=\"file_name\", value=\"paul_graham_essay.txt\"),\n",
    "            MetadataFilter(\n",
    "                key=\"updated_at\",\n",
    "                value=date_to_timestamp(\"2023-01-01\"),\n",
    "                operator=\">=\",\n",
    "            ),\n",
    "            MetadataFilter(\n",
    "                key=\"text\",\n",
    "                value=\"learn\",\n",
    "                operator=\"text_match\",\n",
    "            ),\n",
    "        ],\n",
    "        condition=\"and\",\n",
    "    ),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d136cfb3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:40:22 httpx INFO   HTTP Request: POST https://api.cohere.ai/v1/embed \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:40:22 llama_index.vector_stores.redis.base INFO   Querying index paul_graham with filters ((@file_name:{paul_graham_essay\\.txt} @updated_at:[1672549200 +inf]) @text:(learn))\n",
      "19:40:22 llama_index.vector_stores.redis.base INFO   Found 3 results for query with id ['essay:0df3b734-ecdb-438e-8c90-f21a8c80f552', 'essay:01108c0d-140b-4dcc-b581-c38b7df9251e', 'essay:ced36463-ac36-46b0-b2d7-935c1b38b781']\n",
      "Node ID: 0df3b734-ecdb-438e-8c90-f21a8c80f552\n",
      "Text: All that seemed left for philosophy were edge cases that people\n",
      "in other fields felt could safely be ignored.  I couldn't have put\n",
      "this into words when I was 18. All I knew at the time was that I kept\n",
      "taking philosophy courses and they kept being boring. So I decided to\n",
      "switch to AI.  AI was in the air in the mid 1980s, but there were two\n",
      "things...\n",
      "Score:  0.410\n",
      "\n",
      "Node ID: 01108c0d-140b-4dcc-b581-c38b7df9251e\n",
      "Text: It was not, in fact, simply a matter of teaching SHRDLU more\n",
      "words. That whole way of doing AI, with explicit data structures\n",
      "representing concepts, was not going to work. Its brokenness did, as\n",
      "so often happens, generate a lot of opportunities to write papers\n",
      "about various band-aids that could be applied to it, but it was never\n",
      "going to get us ...\n",
      "Score:  0.390\n",
      "\n",
      "Node ID: ced36463-ac36-46b0-b2d7-935c1b38b781\n",
      "Text: Grad students could take classes in any department, and my\n",
      "advisor, Tom Cheatham, was very easy going. If he even knew about the\n",
      "strange classes I was taking, he never said anything.  So now I was in\n",
      "a PhD program in computer science, yet planning to be an artist, yet\n",
      "also genuinely in love with Lisp hacking and working away at On Lisp.\n",
      "In other...\n",
      "Score:  0.389\n",
      "\n"
     ]
    }
   ],
   "source": [
    "result_nodes = retriever.retrieve(\"What did the author learn?\")\n",
    "\n",
    "for node in result_nodes:\n",
    "    print(node)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8c8849ba",
   "metadata": {},
   "source": [
    "### Restoring from an existing index in Redis\n",
    "Restoring from an index requires a Redis connection client (or URL), `overwrite=False`, and passing in the same schema object used before. (This can be offloaded to a YAML file for convenience using `.to_yaml()`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6792f189",
   "metadata": {},
   "outputs": [],
   "source": [
    "custom_schema.to_yaml(\"paul_graham.yaml\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95817a85",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:40:28 redisvl.index.index INFO   Index already exists, not overwriting.\n"
     ]
    }
   ],
   "source": [
    "vector_store = RedisVectorStore(\n",
    "    schema=IndexSchema.from_yaml(\"paul_graham.yaml\"),\n",
    "    redis_client=redis_client,\n",
    ")\n",
    "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82ea32aa",
   "metadata": {},
   "source": [
    "**In the near future** -- we will implement a convenience method to load just using an index name:\n",
    "```python\n",
    "RedisVectorStore.from_existing_index(index_name=\"paul_graham\", redis_client=redis_client)\n",
    "```"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "52b975a7",
   "metadata": {},
   "source": [
    "### Deleting documents or index completely\n",
    "\n",
    "Sometimes it may be useful to delete documents or the entire index. This can be done using the `delete` and `delete_index` methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6fe322f7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'7056f7ba-3513-4ef4-9792-2bd28040aaed'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "document_id = documents[0].doc_id\n",
    "document_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ce45788",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of documents before deleting 22\n",
      "19:40:32 llama_index.vector_stores.redis.base INFO   Deleted 22 documents from index paul_graham\n",
      "Number of documents after deleting 0\n"
     ]
    }
   ],
   "source": [
    "print(\"Number of documents before deleting\", redis_client.dbsize())\n",
    "vector_store.delete(document_id)\n",
    "print(\"Number of documents after deleting\", redis_client.dbsize())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "442e8acf",
   "metadata": {},
   "source": [
    "However, the Redis index still exists (with no associated documents) for continuous upsert."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12eda458",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vector_store.index_exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c380605a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "19:40:37 llama_index.vector_stores.redis.base INFO   Deleting index paul_graham\n"
     ]
    }
   ],
   "source": [
    "# now lets delete the index entirely\n",
    "# this will delete all the documents and the index\n",
    "vector_store.delete_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "474ad4ee",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of documents after deleting 0\n"
     ]
    }
   ],
   "source": [
    "print(\"Number of documents after deleting\", redis_client.dbsize())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "07514f85",
   "metadata": {},
   "source": [
    "### Troubleshooting\n",
    "\n",
    "If you get an empty query result, there a couple of issues to check:\n",
    "\n",
    "#### Schema\n",
    "\n",
    "Unlike other vector stores, Redis expects users to explicitly define the schema for the index. This is for a few reasons:\n",
    "1. Redis is used for many use cases, including real-time vector search, but also for standard document storage/retrieval, caching, messaging, pub/sub, session mangement, and more. Not all attributes on records need to be indexed for search. This is partially an efficiency thing, and partially an attempt to minimize user foot guns.\n",
    "2. All index schemas, when using Redis & LlamaIndex, must include the following fields `id`, `doc_id`, `text`, and `vector`, at a minimum.\n",
    "\n",
    "Instantiate your `RedisVectorStore` with the default schema (assumes OpenAI embeddings), or with a custom schema (see above).\n",
    "\n",
    "#### Prefix issues\n",
    "\n",
    "Redis expects all records to have a key prefix that segments the keyspace into \"partitions\"\n",
    "for potentially different applications, use cases, and clients.\n",
    "\n",
    "Make sure that the chosen `prefix`, as part of the index schema, is consistent across your code (tied to a specific index).\n",
    "\n",
    "To see what prefix your index was created with, you can run `FT.INFO <name of your index>` in the Redis CLI and look under `index_definition` => `prefixes`.\n",
    "\n",
    "#### Data vs Index\n",
    "Redis treats the records in the dataset and the index as different entities. This allows you more flexibility in performing updates, upserts, and index schema migrations.\n",
    "\n",
    "If you have an existing index and want to make sure it's dropped, you can run `FT.DROPINDEX <name of your index>` in the Redis CLI. Note that this will *not* drop your actual data unless you pass `DD`\n",
    "\n",
    "#### Empty queries when using metadata\n",
    "\n",
    "If you add metadata to the index *after* it has already been created and then try to query over that metadata, your queries will come back empty.\n",
    "\n",
    "Redis indexes fields upon index creation only (similar to how it indexes the prefixes, above)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
